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AGTCCCAGACGGGCTTTTCCCAGAGAGCTJVAAAGAGAAGGGCCAGAGAATGTCGTCCCAG 
5 CCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACTCCTATGGCAGCTGGTAC 
ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGCCCTCCTGCCAC 
AGCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGCTGTCAATCCTTGTGCTG 
CTGCTCCTGGCCATGCTGGTGAGGCGGCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG 
CCCGGCCTGCCCAGCCCTGTGGATTTCTTGGCTGGGGACAGGCCCCGGGCAGTGCCTGCT 

10 GCTGTTTTCATGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTG 
CGCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG 
GCCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTGTGGCTGCCTGT 
GCCACGGCTGGCCACACAGCTGCACACCTGCTCGGCAGCACGGTGTCCTGGGCCCACCTT 
GGGGTCCAGGTCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTAGTAC 

15 TCCCTGCTGGCCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCT 
GTGCAGCTGGTGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGC 
AGCTACTCTGAGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTAC 
CACACCTCCAAGCATGGGTTCCTGTCCTGGGCCCGCGTCTGCTTGAGAGACTGCATCTAC 
ACTCCACAGCCAGGATTCCATCTGCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGG 

2 0 ACGGCGATTTACCAGGTGGCCCTGGTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAG 
GTGAGGGCAGGGGTCACCACGGATGTCTCCTACCTGCTGGCCGGGTTTGGAATCGTGCTC 
TCCGAGGACAAGCAGGAGGTGGTGGAGGTGGTGAAGCAGCATCTGTGGGCTCTGGAAGTG 
TGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCAGCTTCCTGGTCCTGATGCGCTCA 
CTGGTGACACACAGGAGCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGT 

2 5 CCCTTGCATCGGAGTGCCCATGCCTCGCGCCAAGCCATATTCTGTTGGATGAGCTTCAGT 

GCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTG 
GGAAGCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 
CTCTTCCGTTCCCTGGAGTCCTGGTGGCCCTTCTGGGTGACTTTGGCCCTGGCTGTGATC 
CTGCAGAACATGGCAGCGCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGGTG 

3 0 ACCAACCGGCGAGTGCTCTATGCAGCCACCTTTGTTCTCTTGCCGCTCAATGTGCTGGTG 

GGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTGTGCCCTCTACAACGGCATCGACCTT 
GGCCAGATGGACCTCAGCCTGGTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTAC 
ACGTACCGAAAGTTCTTGAAGATTGAAGTCAGGCAGTCGCATCCAGCCATGACAGCCTTC 
TGCTCCCTGCTCCTGCAAGCGCAGAGGCTCCTAGCCAGGACCATGGCAGCCCCCCAGGAC 

3 5 AGCCTCAGACCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATG 

GCCAAGGGAGCTAGGGCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCGTACACG 
CTGCTGGACAACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 
GCCCAGCCCTGAGGGCAGGGAAGGTCAAGCCACCTGCCCATCTGTGCTGAGGCATGTTCC 
TGCCTACCATCCTCCTCCCTCCCCGGCTCTCCTCCCAGGATCACACCAGCCATGCAGCCA 

4 0 GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAG 

GGCTCTGCTCCAGGCACTTGGCTATGGGAGAGGGAGCAGGGGTTCTGGAGAAAAAAAGTG 
GTGGGTTAGGGCCTTGGTCCAGGAGGCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTC 
CCTACGCTGGCTCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCGTTCTCTGGAACCACT 
CCAGGCGAGCTCCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCT 
4 5 CACCCCCTCAGCGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGTCCTCTGGC 
CTGCAGGGGAGCCCAAGTCATGACTCAGAGCAGGTCCCACACTGAGCTGCCCACACTCGA 
GAGCCAGATATTTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTC 
CCTGCAATAAACTTGTTCCTGAGAAAAAAAAAAAAAAAAAAAJU^AA^^ 
AAAA^AAAAAAAAAAAAAAJ^AAAAAAAA^^ 

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HSSQPAGNQTSPGATEDYSYGSWYIDBPQGGEELQPBGEVPSCHTSIPPGLYHJ\CI>ASI>S 
ILVLLLLAMLTORRQLWPOCTOGRPGLPSPVDFl^GDRPJU^VPAAVFMV 
EDALPFLTLASAPSQOGKTEAPKGAWKIL.GLFYyAALrYYPLAACATAGHTAAHl.l.GSTLS 
5 WAHLGVQWQRAECP0VPKIYKYYSLLAS1.P1.I,LGLGFL,SLWYPVQL.V]RSFSRRTGAGSK 
GLQSSYSEEYLRNLLCRK1CLGSSYHTSKHGFLSWARVCLRHC1YTPQPGFHLPL.KLVI.SA 
TLiTGTAl YQVALLLLVGWPTI QKVRAGVTTDVSYLIJ^GFGI VI>SEDKQBVArEL^ 
AI^EVCYISAI^VLSCLLTFI.VLMRS1>VTHRTNLRAI.HRGAAI>DLSPI>HRSPHPSRQAIFCW 
MSFSAYQTAF1CI.GX.LVQQIIFFI.GTTAMFLVLMPV1,HGRNLLI.FRSLBSSWPFWI.TIA 
1 O LAVI LQNMAAHOTFI.ETHDGHPQI.TNRR VI.YAATFLi.FPl.lWLVGA^WATV^R VLLSAiYK 
AIHLGQMDi>SLLPPRAATLDPGYYTYRJSIFI.KIEVSOSHPAMTAFCSLLI>QAQSLLPRTMA 
APQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTALL 
GANGAQP 

Important features of the protein:

Signal peptide:

None

Transmembrane domain:

54-69
102-139
148-166
207-222
301-320
364-380
431-451
474-489
560-535

Motif file:

Motif name: N-glycosylation site.
8-12

Motif name: N-myristoylation site.
50-56
176-182
241-247
317-323
341-347
525-531
627-633
631-637
640-646
661-667

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
364-375

Motif name: ATP/GTP-binding site motif A (P-loop).
132-140
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PRO XXXXXXXXXXXXXXX (Length = 15 amino acids) 

Comparison Protein XXXXXYYYYYYY (Length = 12 amino acids) 

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues
of the PRO polypeptide) =

5 divided by 15 = 33.3%


FIGURE 3B 
PRO XXXXXXXXXX (Length = 10 amino acids)

Comparison Protein XXXXXYYYYYYZZYZ (Length = 15 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues
of the PRO polypeptide) =

5 divided by 10 = 50%
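
The arithmetic in the worked examples can be sketched in C as follows. This is only an illustration: `count_identities` and `percent_identity` are hypothetical helper names that do not appear in the listing, and the counting helper assumes the two strings are already aligned position by position, whereas in the examples the match count is whatever ALIGN-2 reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count positions at which two already-aligned
 * sequences carry the same residue. Positions beyond the shorter string
 * are ignored, and '-' (a gap) never counts as a match. */
size_t count_identities(const char *a, const char *b)
{
    size_t n = 0;

    assert(a != NULL && b != NULL);
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (*a == *b && *a != '-')
            n++;
    return n;
}

/* % identity = 100 * (identical matches) / (length of the PRO sequence).
 * The denominator is the PRO length, not the comparison length. */
double percent_identity(size_t matches, size_t pro_length)
{
    assert(pro_length > 0);
    return 100.0 * (double)matches / (double)pro_length;
}
```

With the first example's inputs, `percent_identity(5, 15)` gives roughly 33.3; with the second's, `percent_identity(5, 10)` gives 50. Swapping which sequence is treated as PRO changes the denominator and therefore the result.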
PRO-DNA NNNNNNNNNNNNNN (Length = 14 nucleotides)

Comparison DNA NNNNNNLLLLLLLLLL (Length = 16 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

6 divided by 14 = 42.9%
PRO-DNA NNNNNNNNNNNN (Length = 12 nucleotides)

Comparison DNA NNNNLLLVV (Length = 9 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

4 divided by 12 = 33.3%
/*
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */
#define _M      -8      /* value of a match with a stop */

int     _day[26][26] = {
/*         A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ { 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ { 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ {-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ { 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ { 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ {-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ { 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ {-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ {-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ {-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ {-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ {-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ { 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ {_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ { 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ { 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ {-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ { 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ { 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ { 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ {-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ {-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ { 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};


/*
 */
#include <stdio.h>
#include <ctype.h>

#define MAXJMP  16      /* max jumps in a diag */
#define MAXGAP  24      /* don't continue to penalize gaps larger than this */
#define JMPS    1024    /* max jmps in an path */
#define MX      4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define PINS0   8       /* penalty for a gap */
#define PINS1   4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 - 1 */

struct diag {
    int         score;          /* score at last jmp */
    long        offset;         /* offset of prev block */
    short       ijmp;           /* current jmp index */
    struct jmp  jp;             /* list of jmps */
};

struct path {
    int     spc;                /* number of leading spaces */
    short   n[JMPS];            /* size of jmp (gap) */
    int     x[JMPS];            /* loc of jmp (last elem before gap) */
};

char        *ofile;             /* output file name */
char        *namex[2];          /* seq names: getseqs() */
char        *prog;              /* prog name for err msgs */
char        *seqx[2];           /* seqs: getseqs() */
int         dmax;               /* best diag: nw() */
int         dmax0;              /* final diag */
int         dna;                /* set if dna: main() */
int         endgaps;            /* set if penalizing end gaps */
int         gapx, gapy;         /* total gaps in seqs */
int         len0, len1;         /* seq lens */
int         ngapx, ngapy;       /* total size of gaps */
int         smax;               /* max score: nw() */
int         *xbm;               /* bitmap for matching */
long        offset;             /* current offset in jmp file */
struct diag *dx;                /* holds diagonals */
struct path pp[2];              /* holds path for seqs */

char        *calloc(), *malloc(), *index(), *strcpy();
char        *getseq(), *g_calloc();

/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case and may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", prog);
        fprintf(stderr, "where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr, "The sequences can be in upper- or lower-case\n");
        fprintf(stderr, "Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr, "Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;                /* 1 to penalize endgaps */
    ofile = "align.out";        /* output file */

    nw();           /* fill in the matrix, get the possible jmps */
    readjmps();     /* get the actual jmps */
    print();        /* print stats, alignment */

    cleanup(0);     /* unlink any tmp files */
}
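
The `_dbval` table encodes each nucleotide letter as a bit set, so that `xbm[a]&xbm[b]` is nonzero whenever two codes can stand for a common base. The following minimal sketch illustrates that test in isolation. The names `dna_mask` and `bases_compatible` are hypothetical; the table values are copied from `_dbval`, and the reading of the bits as A=1, C=2, G=4, T/U=8 is an interpretation of those values rather than something the listing states.

```c
#include <assert.h>

/* Values copied from _dbval, indexed by letter - 'A'. Read as bit sets:
 * A=1, C=2, G=4, T=U=8; an ambiguity code is the OR of the bases it
 * covers (for example R = A|G = 5, Y = C|T = 10, N = 15). */
static const int dna_mask[26] = {
    1, 14, 2, 13, 0, 0, 4, 11, 0, 0, 12, 0, 3,
    15, 0, 0, 0, 5, 6, 8, 8, 7, 9, 0, 10, 0
};

/* Hypothetical helper: returns 1 when two upper-case nucleotide codes
 * share at least one base, 0 otherwise. */
int bases_compatible(char x, char y)
{
    assert(x >= 'A' && x <= 'Z' && y >= 'A' && y <= 'Z');
    return (dna_mask[x - 'A'] & dna_mask[y - 'A']) != 0;
}
```

For instance, `bases_compatible('A', 'R')` is true because R covers A and G, while `bases_compatible('A', 'Y')` is false. Letters whose table entry is 0 never match anything, including themselves.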

/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char        *px, *py;       /* seqs and ptrs */
    int         *ndely, *dely;  /* keep track of dely */
    int         ndelx, delx;    /* keep track of delx */
    int         *tmp;           /* for swapping row0, row1 */
    int         mis;            /* score for each type */
    int         ins0, ins1;     /* insertion penalties */
    register    id;             /* diagonal index */
    register    ij;             /* jmp index */
    register    *col0, *col1;   /* score for curr, last row */
    register    xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }

        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongoing del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongoing del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */
            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}
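
The gap model used by `nw()` charges `ins0` once per gap plus `ins1` per gapped element, and a new gap is preferred over extending an old one only when it scores at least as well. The sketch below shows that affine-gap idea in a much smaller form. It is not the listing's algorithm: it is a textbook Gotoh-style recurrence that returns only the score, always penalizes end gaps, and omits the diagonal and `jmp` bookkeeping used for traceback. The function name `affine_global_score` and the macro names are assumptions; the numeric values mirror `DMAT`, `DMIS`, `DINS0` and `DINS1`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MATCH       3   /* DMAT in the listing */
#define MISMATCH    0   /* DMIS */
#define GAP_OPEN    8   /* DINS0: charged once per gap */
#define GAP_EXTEND  1   /* DINS1: charged per gapped base */
#define NEG_INF     (-1000000)

static int max2(int a, int b) { return a > b ? a : b; }

/* Global alignment score with affine gaps, end gaps penalized.
 * h[j] holds the best score for the current cell, x[j] the best score
 * ending in a gap in b, and y the best score ending in a gap in a. */
int affine_global_score(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j;
    int *h = malloc((lb + 1) * sizeof *h);
    int *x = malloc((lb + 1) * sizeof *x);
    int score;

    assert(h != NULL && x != NULL);
    h[0] = 0;
    x[0] = NEG_INF;
    for (j = 1; j <= lb; j++) {
        h[j] = -(GAP_OPEN + (int)j * GAP_EXTEND);   /* leading gap in a */
        x[j] = NEG_INF;
    }
    for (i = 1; i <= la; i++) {
        int diag = h[0];        /* score of cell (i-1, 0) */
        int y = NEG_INF;        /* no gap in a can end in column 0 */

        h[0] = x[0] = -(GAP_OPEN + (int)i * GAP_EXTEND);    /* leading gap in b */
        for (j = 1; j <= lb; j++) {
            int sub = diag + (a[i-1] == b[j-1] ? MATCH : MISMATCH);

            /* open a new gap from the cell above/left, or extend one */
            x[j] = max2(h[j] - (GAP_OPEN + GAP_EXTEND), x[j] - GAP_EXTEND);
            y = max2(h[j-1] - (GAP_OPEN + GAP_EXTEND), y - GAP_EXTEND);
            diag = h[j];        /* keep cell (i-1, j) for the next column */
            h[j] = max2(sub, max2(x[j], y));
        }
    }
    score = h[lb];
    free(h);
    free(x);
    return score;
}
```

Under these parameters two identical four-base sequences score 12, and aligning `ACGT` against `AGT` scores 0: three matches (9) minus one single-base gap (8 + 1).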

/*
 * print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int     lx, ly, firstgap, lastgap;      /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr, "%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}
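
`print()` converts the best diagonal `dmax` into leading overhangs using the diagonal numbering from `nw()`, `id = xx - yy + len1 - 1`. A small sketch of that mapping, under the assumption that positions are 1-based as in the listing, is given below. The function names `diag_index` and `leading_overhang` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Diagonal index as computed in nw(): xx is the 1-based position in the
 * first sequence, yy the 1-based position in the second. */
int diag_index(int xx, int yy, int len1)
{
    assert(xx >= 1 && yy >= 1 && yy <= len1);
    return xx - yy + len1 - 1;
}

/* Leading overhang implied by a diagonal, following print(): the main
 * diagonal is len1 - 1; below it sequence x is shifted right, above it
 * sequence y is shifted right. */
void leading_overhang(int dmax, int len1, int *lead_x, int *lead_y)
{
    assert(lead_x != 0 && lead_y != 0);
    *lead_x = *lead_y = 0;
    if (dmax < len1 - 1)
        *lead_x = len1 - dmax - 1;      /* spaces printed before x */
    else if (dmax > len1 - 1)
        *lead_y = dmax - (len1 - 1);    /* spaces printed before y */
}
```

With `len1 = 5`, the cell `(xx, yy) = (1, 1)` lies on diagonal 4, which yields no overhang, while diagonal 6 shifts the second sequence right by two positions.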

/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int     lx, ly;                 /* "core" (minus endgaps) */
    int     firstgap, lastgap;      /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);

    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
        fprintf(fx, "%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
        fprintf(fx, "%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized, left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}

static          nm;             /* matches in core -- for checking */
static          lmax;           /* lengths of stripped file names */
static          ij[2];          /* jmp index for a path */
static          nc[2];          /* number at start of current line */
static          ni[2];          /* current elem number -- for gapping */
static          siz[2];
static char     *ps[2];         /* ptr to current element */
static char     *po[2];         /* ptr to next output char slot */
static char     out[2][P_LINE]; /* output line */
static char     star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int         nn;     /* char count */
    int         more;
    register    i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }

    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {        /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {      /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {                  /* we're putting a seq element
                                     */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}

/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register    i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';

    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int     ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int     ix;
{
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {
            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}

/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}
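
`stripname()` keeps only the part of a file name after the last `/` and overwrites the name in place. A non-mutating variant of the same idea is sketched below, assuming only the standard `strrchr` function. The name `base_name` is hypothetical and is not part of the listing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the component after the last
 * '/' in path, or path itself when it contains no '/'. The input string
 * is not modified. */
const char *base_name(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

Returning a pointer into the original string avoids copying between overlapping buffers, which `strcpy` does not guarantee to handle. The length that `stripname()` returns can then be obtained with `strlen(base_name(path))`.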



Page 7 of nwprint.c



FIGURE 4N 



/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checking
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();                      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int     i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file, "r")) == 0) {
        fprintf(stderr, "%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr, "%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';

    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU", *(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
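
`getseq()` decides between DNA and protein by comparing the count of A, T, G, C and U characters with one third of the sequence length. The sketch below isolates that heuristic under a simplifying assumption: each letter is examined exactly once, whereas the listing tests the most recently stored character on every pass through the loop. The function name `looks_like_dna` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 when more than one third of the letters
 * in seq are A, T, G, C or U (case-insensitive), using the same integer
 * division as the listing's test natgc > (tlen/3). */
int looks_like_dna(const char *seq)
{
    int letters = 0, natgc = 0;

    assert(seq != NULL);
    for (; *seq != '\0'; seq++) {
        if (!isalpha((unsigned char)*seq))
            continue;                   /* skip digits, spaces, newlines */
        letters++;
        if (strchr("ATGCU", toupper((unsigned char)*seq)) != NULL)
            natgc++;
    }
    return natgc > letters / 3;
}
```

A protein that happens to be rich in alanine, glycine, cysteine and threonine can exceed the threshold, so the result is a guess rather than a guarantee.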



char *
g_calloc(msg, nx, sz)
    char    *msg;       /* program, calling routine */
    int     nx, sz;     /* number and size of elements */
{
    char    *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}

/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int         fd = -1;
    int         siz, i0, i1;
    register    i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;
            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {          /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {     /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}

Page 3 of nwsubr.c

/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int     ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}
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GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTG 
GAAGTGTGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATG 
CGCTCACTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGAC 
TTGAGTCCCTTGCATCGGAGTCCCCATCCCTCCCGGCAAGCCATATTCTGTTGGATGAGC 

1 0 TTCAGTGCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTC 
TTCCTGGGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAAC 
GTCCTGCTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCX 
GTGATCCTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCA 
CAGCTGACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTG 

15 CTGGTGGGTGGCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATC 
GACCTTGGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCGCGGC 
TACTACAGGTACCGAA 
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CACAACCAGCCACCCCTCTAGGATCCCAGCCCAGCTGGTGCTGGGCTCAGAGGAGAAGGC 
5 CCCGTGTTGGGAGCACCCTGCTTGCCTGGAGGGACAAGTTTCCGGGAGAGATCAATTW^G 
GAAAGGAAAGAGACAAGGAAGGGAGAGGTCAGGAGAGCGCTTGATTGGAGGAGAAGGGCC 
AGAGAATGTCGTCCCAGCGAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACT 
CCTATGGCAGCTGGTACATCGATGAGCCGCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGG 
7\AGTGCCCTCCTGCCACACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGC 

10 TGTCAATCCTTGTGCTGCTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTG 
ACTGTGTGCGTGGCAGGCCGGGCCTGCCCAGGCCCCGGGCAGTGCCTGCTGCTGTTTTCA 
TGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTGCCCTTCCTGA 
CTCTCGCCTGAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGGGCCTGGAAGA 
TACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGTGCCACGGCTG 

1 5 GCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTTGGGGTCCAGG 
TCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCC7\AGATCTACAAGTACTACTCCCTGCTGG 
CCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCTGTGCAGCTGG 
TGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGCAGCTACTCTG 
AGG7VATATCTGAGGAACCTCCTTTGGAGG7\AGi^GCTGGG7U\GCAGCTACGACACCTCCA 

2 0 AGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACAGTGCATCTACACTCCACAGC 
CAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGGACGGCGATTT 
ACGAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAGGTGAGGGCAG 
GGGTCACCACGGATGTCTCCTACCTGCTGGGCGGCTTTGGAATCGTGGTCTCCGAGGACA 
AGCAGGAGGTGGTGGAGCTGGTG7\AGCACCATCTGTGGGCTCTGGAAGTGTGCTAGATCT 

2 5 CAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCACTGGTGACAC 

ACAGGACCAACGTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGTCCCTTGCATC 
GGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTGAGTGCCTACCAGA 
CAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTGGGAACCACGG 
CCCTGGCCTTCGTGGTGCTCATGCCTGTGCTCGATGGCAGGAACCTCCTGCTCTTCCGTT 

3 0 CCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATCCTGCAG7y\GA 

TGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTGACC7VACCGGC 
GAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCGCTGAATGTGCTGGTGGGTGCGATAG 
TGGGCACCTGGGGAGTGCTCCTCTCTGCCCTCTAC7y\CGCCATCCACCTTGGCCAGATGG 
ACCTGAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACGCCGGCTACTACACGTACCGAA 

3 5 ACTTCTTGAAGATTGTyVGTCAGCCAGTCGCATCCAGCCATGACAGCCTTCTGCTCCCTGC 

TCCTGC7\AGCGCAGAGCCTCCTACCGAGGACCATGGCAGCCCCCCAGGACAGCCTCAGAC 
CAGGGGAGG7U\GACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATGGCCAAGGGAG 
CTAGGGCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACGCTGCTGCACA 
ACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGTGCCCAGCCCT 

4 0 GAGGGCAGGGAAGGTCAACCCACCTGCCGATCTGTGCTGAGGCATGTTCCTGCCTACCAC 

CTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCAGCAGGTCCTCC 
GGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAGGGCTCTGCTCC 
AGCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 
CCTTGGTCGAGGAGCGAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTCCCTACCCTGGC 

4 5 TCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACGACTCCAGCCCAGCT 
CCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCTCACCCCCTCAG 
CGCCACGGACCTCTCTGGGGAGTGGCCGG7WVGCTCCCGGGCCTCTGGCCTGCAGGGGAG 
CCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCGCACACTCGAGAGCCAGATAT 
TTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTCCCTGCAATAAA 

50 CTTGTTCCTGAGA7VA7y\ 
FIGURE 12B
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